Info¶

In short:

  • I built a set of embeddings for the items
  • a new item is transformed into an embedding
  • I look for the most similar item in the prepared set
  • the new item is assigned the class of that most similar item
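
The steps above amount to a 1-nearest-neighbour classifier in embedding space. A minimal sketch, assuming tiny hand-made 2-D vectors in place of real sentence embeddings (the vectors, the labels and the `nearest_label` helper are illustrative and not taken from the notebook):

```python
import numpy as np


def nearest_label(query: np.ndarray, base: np.ndarray, labels: list[str]) -> str:
    """Return the label of the base vector most similar to `query` (cosine)."""
    # Normalise rows so that a plain dot product equals cosine similarity.
    base_unit = base / np.linalg.norm(base, axis=1, keepdims=True)
    query_unit = query / np.linalg.norm(query)
    scores = base_unit @ query_unit
    return labels[int(np.argmax(scores))]


# Toy "embedding base": three items with known classes (made-up numbers).
base_vectors = np.array([[1.0, 0.1], [0.9, 0.3], [0.0, 1.0]])
base_labels = ["Furniture", "Furniture", "Lighting"]

new_item = np.array([0.1, 0.8])  # stands in for the embedding of a new item
predicted = nearest_label(new_item, base_vectors, base_labels)
print(predicted)
```

In the notebook itself the vectors come from the sentence-transformers model and the labels from the training rows.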

In detail:

  • I split the Excel data into test (1000 rows) and train (the rest, about 3,000 rows) so that the proportions of the main class are preserved
  • from the train data I built an embedding base:
    • a text paragraph composed of supplier_name, supplier_reference_description and purchase_price
    • the embedding model is the classic sentence-transformers/all-mpnet-base-v2
  • for each row in the test data:
    • I build an analogous text paragraph
    • in the embedding base I pick the most similar entry by cosine similarity
    • I take its main class as the prediction
    • I narrow the base/training set to rows with that main class
    • I search again for the most similar embedding and pick the sub class
    • I narrow the base/training set to rows with that sub class and look for the detail class in the same way
    • I repeat the narrowing and search to find the last class, level4
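
The narrowing loop described above can be sketched compactly. This is an illustration under stated assumptions: the base rows are plain dicts, the 2-D vectors are made up, and `predict_hierarchy` is a hypothetical helper rather than the notebook's own code:

```python
import numpy as np

LEVELS = ["main", "sub", "detail", "level4"]


def predict_hierarchy(query, base_vectors, base_rows):
    """Pick one class per level, narrowing the candidate set after each pick."""
    # Cosine similarity of the query to every base vector, computed once.
    unit = base_vectors / np.linalg.norm(base_vectors, axis=1, keepdims=True)
    scores = unit @ (query / np.linalg.norm(query))

    candidates = np.arange(len(base_rows))  # indices still in play
    prediction = {}
    for level in LEVELS:
        best = candidates[np.argmax(scores[candidates])]
        prediction[level] = base_rows[best][level]
        # Keep only rows that share the class just predicted at this level.
        candidates = np.array(
            [i for i in candidates if base_rows[i][level] == prediction[level]]
        )
    return prediction


# Made-up base set: three labelled rows with toy 2-D "embeddings".
base_rows = [
    {"main": "Decoration", "sub": "Clocks", "detail": "Wall Clocks", "level4": "Unspecified"},
    {"main": "Decoration", "sub": "Clocks", "detail": "Table Clocks", "level4": "Unspecified"},
    {"main": "Lighting", "sub": "Floor Lamps", "detail": "Unspecified", "level4": "Unspecified"},
]
base_vectors = np.array([[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]])

prediction = predict_hierarchy(np.array([0.7, 0.7]), base_vectors, base_rows)
print(prediction)
```

One property worth noting: with a single nearest neighbour, the globally most similar row always survives every narrowing step, so the four predicted labels are exactly the labels of that one row. The narrowing would only change the outcome under a different selection rule.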

Classification accuracy metric:

  • the share of correctly classified items from the test set
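
As a small worked example of this metric, the sketch below computes per-level and combined accuracy on four made-up test items (the label arrays are invented for illustration, and only two of the four levels are shown):

```python
import numpy as np

# Made-up labels for four test items (true vs. predicted), two levels only.
true_main = np.array(["Decoration", "Decoration", "Furniture", "Lighting"])
pred_main = np.array(["Decoration", "Lighting", "Furniture", "Lighting"])
true_sub = np.array(["Clocks", "Mirrors", "Tables", "Floor Lamps"])
pred_sub = np.array(["Clocks", "Wall Lamps", "Storage", "Floor Lamps"])

main_hit = true_main == pred_main
sub_hit = true_sub == pred_sub

main_accuracy = main_hit.mean()               # 3 of 4 correct
sub_accuracy = sub_hit.mean()                 # 2 of 4 correct
total_accuracy = (main_hit & sub_hit).mean()  # both levels correct: 2 of 4
```

Each level is scored independently, so a deeper level can score higher than a shallower one, for example when the true and predicted level4 are both "Unspecified" although the detail class is wrong. The combined accuracy can never exceed the lowest per-level accuracy.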

Limitations, errors:

  • the base/training set must be kept up to date with respect to new items

Imports¶

In [1]:
import pandas as pd
import plotly.graph_objects as go
import plotly.io as pio
from sentence_transformers import SentenceTransformer
from numpy import dot, argmax
from numpy.linalg import norm
from tqdm import tqdm


pio.templates.default = "plotly_dark"

Parameters¶

In [11]:
MAIN_CLASSES = [
    "Furniture",
    "Lighting",
    "Home Textiles",
    "Tableware",
    "Decoration",
    "Flowers & Plants"
]
TEST_ROWS = 1000

Utils¶

In [12]:
def train_test_split(raw_df: pd.DataFrame):
    # Keep only known main classes; copy so the fillna below does not
    # write into a slice of raw_df (avoids SettingWithCopyWarning).
    df = raw_df[raw_df["main"].isin(MAIN_CLASSES)].copy()
    for col in ["main", "sub", "detail", "level4"]:
        df[col] = df[col].fillna("Unspecified")

    # Share of each main class, used to stratify the test sample.
    ratios = df["main"].value_counts(normalize=True).to_dict()

    df = df.sample(len(df))  # shuffle data
    test_df = pd.DataFrame()

    for main_class, ratio in ratios.items():
        new_df = df[df["main"] == main_class].sample(int(TEST_ROWS * ratio))
        test_df = pd.concat([test_df, new_df])

    # int() rounds down, so top up with random remaining rows if needed.
    if len(test_df) < TEST_ROWS:
        diff = TEST_ROWS - len(test_df)
        test_df = pd.concat([
            test_df,
            df[~(df["item_id"].isin(test_df["item_id"]))].sample(diff)
        ])

    # Everything not drawn for the test set becomes the base/training set.
    train_df = df[~(df["item_id"].isin(test_df["item_id"]))].copy()

    return test_df, train_df
In [5]:
def get_embedder(model_id: str) -> SentenceTransformer:
    match model_id:
        case "sentence-transformers/all-mpnet-base-v2":
            return SentenceTransformer("sentence-transformers/all-mpnet-base-v2")
        case _:
            raise ValueError(f"Unsupported model_id: {model_id}")
In [6]:
def generate_embedding_from_text(
        model: SentenceTransformer,
        text_data: list[str]
) -> list:  # list of 1-D numpy arrays, one per input text
    results = []
    for x in tqdm(text_data):
        embedding = model.encode([x])[0]
        results.append(embedding)
    return results
In [7]:
def row_to_text_input(df: pd.DataFrame, i: int) -> str:
    text = f"""
    Supplier name = {df["supplier_name"].iloc[i]}
    Product name = {df["supplier_reference_description"].iloc[i]}
    Product price = {df["purchase_price"].iloc[i]}
    """
    return text
In [8]:
def cosine_sim(a, b) -> float:
    return float(dot(a, b)/(norm(a)*norm(b)))
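
The pairwise `cosine_sim` above is called once per base embedding. A possible vectorised variant is sketched below, assuming the embeddings are stacked into 2-D NumPy arrays; `cosine_sim_matrix` is a hypothetical helper and not part of the notebook:

```python
import numpy as np


def cosine_sim_matrix(queries: np.ndarray, base: np.ndarray) -> np.ndarray:
    """Cosine similarity between every query row and every base row."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    b = base / np.linalg.norm(base, axis=1, keepdims=True)
    return q @ b.T  # shape: (n_queries, n_base)


# Toy vectors (made up): two queries against two base rows.
queries = np.array([[1.0, 0.0], [1.0, 2.0]])
base = np.array([[2.0, 0.0], [0.0, 3.0]])

sims = cosine_sim_matrix(queries, base)
best_idx = sims.argmax(axis=1)  # index of the nearest base row per query
```

This computes all similarities in one matrix product instead of a Python loop, which tends to matter once the base set grows.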
In [9]:
def generate_ratio_df(errors_df: pd.DataFrame, test_df: pd.DataFrame, col: str):
    error_ratios = errors_df[col].value_counts(normalize=True).reset_index().rename(columns={"proportion": "ratio_in_errors"})
    test_ratios = test_df[col].value_counts(normalize=True).reset_index().rename(columns={"proportion": "ratio_in_tests"})
    ratios_df = pd.merge(
        left=error_ratios,
        right=test_ratios,
        on=col,
        how="right"
    ).round(2).fillna(0)
    ratios_df["diff"] = ratios_df["ratio_in_errors"] - ratios_df["ratio_in_tests"]
    print(f'r Pearson Correlation = {round(ratios_df[["ratio_in_errors", "ratio_in_tests"]].corr()["ratio_in_tests"].iloc[0], 3)}')
    return ratios_df

Predictions¶

In [ ]:
raw_df = pd.read_csv("../resources/item data 2026_AW(Sheet1).csv", sep=",")
In [10]:
embedder = get_embedder("sentence-transformers/all-mpnet-base-v2")
In [13]:
test_df, train_df = train_test_split(raw_df)

Training / base embeddings¶

In [14]:
text_inputs = [
    row_to_text_input(train_df, i)
    for i in range(len(train_df))
]
base_embeddings = generate_embedding_from_text(
    model=embedder,
    text_data=text_inputs
)
train_df["embedding"] = base_embeddings
100%|██████████| 3054/3054 [02:57<00:00, 17.22it/s]

Embeddings of "new" items¶

In [15]:
text_inputs = [
    row_to_text_input(test_df, i)
    for i in range(len(test_df))
]
test_embeddings = generate_embedding_from_text(
    model=embedder,
    text_data=text_inputs
)
100%|██████████| 1000/1000 [01:27<00:00, 11.44it/s]

Find the most similar items¶

In [16]:
pred_main, pred_sub, pred_detail, pred_level4 = [], [], [], []
for test_idx in tqdm(range(len(test_df))):
    embedding = test_embeddings[test_idx]

    # main prediction
    sim_scores = [cosine_sim(embedding, x) for x in base_embeddings]
    best_idx = argmax(sim_scores)
    main = train_df["main"].iloc[best_idx]

    # sub prediction
    train_df_selected = train_df[train_df["main"] == main]
    base_embeddings_selected = train_df_selected["embedding"].to_list()
    sim_scores = [cosine_sim(embedding, x) for x in base_embeddings_selected]
    best_idx = argmax(sim_scores)
    sub = train_df_selected["sub"].iloc[best_idx]

    # detail prediction
    train_df_selected = train_df_selected[train_df_selected["sub"] == sub]
    base_embeddings_selected = train_df_selected["embedding"].to_list()
    sim_scores = [cosine_sim(embedding, x) for x in base_embeddings_selected]
    best_idx = argmax(sim_scores)
    detail = train_df_selected["detail"].iloc[best_idx]

    # level4 prediction
    train_df_selected = train_df_selected[train_df_selected["detail"] == detail]
    base_embeddings_selected = train_df_selected["embedding"].to_list()
    sim_scores = [cosine_sim(embedding, x) for x in base_embeddings_selected]
    best_idx = argmax(sim_scores)
    level4 = train_df_selected["level4"].iloc[best_idx]
    
    pred_main.append(main)
    pred_sub.append(sub)
    pred_detail.append(detail)
    pred_level4.append(level4)

test_df["pred_main"] = pred_main
test_df["pred_sub"] = pred_sub
test_df["pred_detail"] = pred_detail
test_df["pred_level4"] = pred_level4
100%|██████████| 1000/1000 [01:07<00:00, 14.81it/s]

Estimate quality¶

In [17]:
test_n = len(test_df)
main_success_ratio = len(test_df[test_df["main"] == test_df["pred_main"]]) / test_n
sub_success_ratio = len(test_df[test_df["sub"] == test_df["pred_sub"]]) / test_n
detail_success_ratio = len(test_df[test_df["detail"] == test_df["pred_detail"]]) / test_n
level4_success_ratio = len(test_df[test_df["level4"] == test_df["pred_level4"]]) / test_n
total_success_ratio = len(
    test_df[(
        (test_df["main"] == test_df["pred_main"])
        & (test_df["sub"] == test_df["pred_sub"])
        & (test_df["detail"] == test_df["pred_detail"])
        & (test_df["level4"] == test_df["pred_level4"])
    )]
) / test_n

print("main_success_ratio = ", round(main_success_ratio, 3))
print("sub_success_ratio = ", round(sub_success_ratio, 3))
print("detail_success_ratio = ", round(detail_success_ratio, 3))
print("level4_success_ratio = ", round(level4_success_ratio, 3))
print("total_success_ratio = ", round(total_success_ratio, 3))
main_success_ratio =  0.981
sub_success_ratio =  0.947
detail_success_ratio =  0.919
level4_success_ratio =  0.964
total_success_ratio =  0.907

Visualization¶

In [25]:
fig = go.Figure()

fig.add_trace(
    go.Bar(
        # orientation="h",
        x=[
            "main",
            "sub",
            "detail",
            "level4",
            "total"
        ],
        y=[
            main_success_ratio,
            sub_success_ratio,
            detail_success_ratio,
            level4_success_ratio,
            total_success_ratio
        ],
        text=[
            main_success_ratio,
            sub_success_ratio,
            detail_success_ratio,
            level4_success_ratio,
            total_success_ratio
        ],
        marker_color=[
            "silver", "silver", "silver","silver", "teal"
        ]
    )
)

fig.update_layout(
    title="Successful predictions",
    width=1000,
    height=600
)

fig.show(renderer="notebook")

Error analysis¶

In [19]:
errors_df = test_df[~(
    (test_df["main"] == test_df["pred_main"])
    & (test_df["sub"] == test_df["pred_sub"])
    & (test_df["detail"] == test_df["pred_detail"])
    & (test_df["level4"] == test_df["pred_level4"])
)]

Errors¶

In [20]:
for i in range(len(errors_df)):
    real_class = f'{errors_df["main"].iloc[i]} / {errors_df["sub"].iloc[i]} / {errors_df["detail"].iloc[i]} / {errors_df["level4"].iloc[i]}'
    pred_class = f'{errors_df["pred_main"].iloc[i]} / {errors_df["pred_sub"].iloc[i]} / {errors_df["pred_detail"].iloc[i]} / {errors_df["pred_level4"].iloc[i]}'
    print(f"Real = {real_class}\nPred = {pred_class}\n\n")
Real = Decoration / Home Accessories / Decorative Objects / Objects
Pred = Decoration / Home Accessories / Decorative Objects / Decorative Letters & Numbers


Real = Decoration / Home Accessories / Decorative Objects / Objects
Pred = Decoration / Decoration Storage / Coat Racks / Unspecified


Real = Decoration / Mirrors / Wall Mirrors / Unspecified
Pred = Decoration / Decorative Materials / Unspecified / Unspecified


Real = Decoration / Decoration Storage / Unspecified / Unspecified
Pred = Decoration / Decoration Storage / Storage Boxes / Unspecified


Real = Decoration / Decoration Storage / Unspecified / Unspecified
Pred = Decoration / Decoration Storage / Storage Boxes / Unspecified


Real = Decoration / Home Accessories / Figurines / Flowers & Plants
Pred = Decoration / Flower Pots & Vases / Flower pots / Unspecified


Real = Decoration / Home Accessories / Figurines / Fantasy
Pred = Home Textiles / Cushions / Cushions & Cushion Covers / Unspecified


Real = Decoration / Garden Accessories / Bird Houses & Cages / Unspecified
Pred = Lighting / Wall Lamps / Unspecified / Unspecified


Real = Decoration / Home Accessories / Other / Garlands & hangers
Pred = Decoration / Home Accessories / Decorative Objects / Objects


Real = Decoration / Home Accessories / Decorative Objects / Objects
Pred = Decoration / Candles & Candle Holders / Tealight Holders / Unspecified


Real = Decoration / Home Accessories / Decorative Objects / Objects
Pred = Decoration / Home Accessories / Figurines / People


Real = Decoration / Clocks / Table Clocks / Unspecified
Pred = Decoration / Clocks / Wall Clocks / Unspecified


Real = Decoration / Unspecified / Unspecified / Unspecified
Pred = Decoration / Home Accessories / Figurines / Animals


Real = Decoration / Home Accessories / Figurines / Animals
Pred = Decoration / Flower Pots & Vases / Flower pots / Unspecified


Real = Decoration / Clocks / Table Clocks / Unspecified
Pred = Decoration / Clocks / Wall Clocks / Unspecified


Real = Decoration / Flower Pots & Vases / Floor Vases / Unspecified
Pred = Decoration / Flower Pots & Vases / Vases / Unspecified


Real = Decoration / Decoration Storage / Unspecified / Unspecified
Pred = Decoration / Decoration Storage / Storage Boxes / Unspecified


Real = Decoration / Flower Pots & Vases / Vases / Unspecified
Pred = Decoration / Flower Pots & Vases / Soliflores / Unspecified


Real = Decoration / Home Accessories / Decorative Objects / Decorative Trees
Pred = Lighting / Decorative Lighting / Decorative Lighting / Unspecified


Real = Decoration / Decoration Storage / Storage Jars / Unspecified
Pred = Decoration / Flower Pots & Vases / Vases / Unspecified


Real = Decoration / Home Accessories / Decorative Objects / Objects
Pred = Decoration / Home Accessories / Figurines / Animals


Real = Decoration / Flower Pots & Vases / Vases / Unspecified
Pred = Decoration / Flower Pots & Vases / Flower pots / Unspecified


Real = Decoration / Wall Decoration / Prints & Posters / Unspecified
Pred = Decoration / Candles & Candle Holders / Tealight Holders / Unspecified


Real = Decoration / Flower Pots & Vases / Vases / Unspecified
Pred = Decoration / Flower Pots & Vases / Floor Vases / Unspecified


Real = Decoration / Home Accessories / Other / Garlands & hangers
Pred = Lighting / Decorative Lighting / Decorative Lighting / Unspecified


Real = Decoration / Home Accessories / Figurines / Animals
Pred = Decoration / Home Accessories / Figurines / Flowers & Plants


Real = Decoration / Candles & Candle Holders / Tealight Holders / Unspecified
Pred = Decoration / Candles & Candle Holders / Hurricane Lights & Lanterns / Unspecified


Real = Decoration / Home Accessories / Figurines / Animals
Pred = Home Textiles / Soft Toys / Unspecified / Unspecified


Real = Decoration / Garden Accessories / Fire Pits, Braziers & Fireplaces / Unspecified
Pred = Lighting / Desk & Table Lamps / Unspecified / Unspecified


Real = Decoration / Home Accessories / Figurines / People
Pred = Home Textiles / Cushions / Cushions & Cushion Covers / Unspecified


Real = Decoration / Wall Decoration / Paintings / Unspecified
Pred = Decoration / Wall Decoration / Unspecified / Unspecified


Real = Decoration / Home Accessories / Decorative Objects / Decorative Trays
Pred = Decoration / Home Accessories / Decorative Objects / Decorative Trees


Real = Decoration / Home Accessories / Figurines / Animals
Pred = Decoration / Home Accessories / Figurines / Flowers & Plants


Real = Decoration / Home Accessories / Decorative Objects / Objects
Pred = Furniture / Tables / Side Tables / Unspecified


Real = Decoration / Wall Decoration / Paintings / Unspecified
Pred = Decoration / Wall Decoration / Prints & Posters / Unspecified


Real = Decoration / Unspecified / Unspecified / Unspecified
Pred = Decoration / Home Accessories / Other / Garlands & hangers


Real = Decoration / Flower Pots & Vases / Vases / Unspecified
Pred = Decoration / Flower Pots & Vases / Floor Vases / Unspecified


Real = Decoration / Home Accessories / Other / Garlands & hangers
Pred = Decoration / Home Accessories / Unspecified / Unspecified


Real = Decoration / Home Accessories / Decorative Objects / Decorative Cones
Pred = Decoration / Home Accessories / Decorative Objects / Decorative Trees


Real = Decoration / Home Accessories / Decorative Objects / Objects
Pred = Decoration / Decorative Materials / Unspecified / Unspecified


Real = Decoration / Home Accessories / Decorative Objects / Objects
Pred = Decoration / Home Accessories / Figurines / Fantasy


Real = Decoration / Home Accessories / Decorative Objects / Objects
Pred = Decoration / Unspecified / Unspecified / Unspecified


Real = Decoration / Flower Pots & Vases / Vases / Unspecified
Pred = Decoration / Flower Pots & Vases / Floor Vases / Unspecified


Real = Decoration / Home Accessories / Figurines / People
Pred = Decoration / Home Accessories / Decorative Objects / Objects


Real = Decoration / Home Accessories / Figurines / Animals
Pred = Decoration / Unspecified / Unspecified / Unspecified


Real = Decoration / Home Accessories / Other / Garlands & hangers
Pred = Decoration / Home Accessories / Figurines / Fantasy


Real = Decoration / Candles & Candle Holders / Tealight Holders / Unspecified
Pred = Decoration / Flower Pots & Vases / Vases / Unspecified


Real = Decoration / Home Accessories / Decorative Objects / Objects
Pred = Decoration / Home Accessories / Figurines / Fantasy


Real = Decoration / Flower Pots & Vases / Flower pots / Unspecified
Pred = Decoration / Flower Pots & Vases / Vases / Unspecified


Real = Decoration / Wall Decoration / Paintings / Unspecified
Pred = Decoration / Photo Frames / Photo Frames / Unspecified


Real = Decoration / Candles & Candle Holders / Tealight Holders / Unspecified
Pred = Decoration / Candles & Candle Holders / Hurricane Lights & Lanterns / Unspecified


Real = Decoration / Home Accessories / Decorative Objects / Objects
Pred = Decoration / Home Accessories / Figurines / Animals


Real = Decoration / Home Accessories / Other / Garlands & hangers
Pred = Decoration / Decorative Materials / Unspecified / Unspecified


Real = Decoration / Flower Pots & Vases / Flower pots / Unspecified
Pred = Decoration / Flower Pots & Vases / Vases / Unspecified


Real = Furniture / Tables / Pedestals / Unspecified
Pred = Decoration / Flower Pots & Vases / Flower pots / Unspecified


Real = Furniture / Tables / Coffee Tables / Unspecified
Pred = Furniture / Tables / Side Tables / Unspecified


Real = Furniture / Storage / Shelving Units / Unspecified
Pred = Decoration / Home Accessories / Decorative Objects / Decorative Trees


Real = Furniture / Storage / Shelving Units / Unspecified
Pred = Furniture / Tables / Coffee Tables / Unspecified


Real = Furniture / Storage / Buffets / Unspecified
Pred = Furniture / Storage / Cabinets & Sideboards / Unspecified


Real = Furniture / Sofas & Armchairs / Armchairs / Unspecified
Pred = Furniture / Chairs / Office Chairs / Unspecified


Real = Furniture / Tables / Side Tables / Unspecified
Pred = Furniture / Tables / Coffee Tables / Unspecified


Real = Furniture / Storage / Bookcases / Unspecified
Pred = Lighting / Desk & Table Lamps / Unspecified / Unspecified


Real = Furniture / Tables / Coffee Tables / Unspecified
Pred = Decoration / Decoration Storage / Storage Boxes / Unspecified


Real = Furniture / Tables / Coffee Tables / Unspecified
Pred = Decoration / Clocks / Wall Clocks / Unspecified


Real = Furniture / Tables / Coffee Tables / Unspecified
Pred = Furniture / Tables / Side Tables / Unspecified


Real = Furniture / Storage / Chests of Drawers / Unspecified
Pred = Lighting / Desk & Table Lamps / Unspecified / Unspecified


Real = Furniture / Tables / Side Tables / Unspecified
Pred = Furniture / Storage / Trolleys / Unspecified


Real = Furniture / Storage / Ladder Shelves / Unspecified
Pred = Furniture / Storage / Wall Shelves / Unspecified


Real = Furniture / Tables / Coffee Tables / Unspecified
Pred = Furniture / Tables / Side Tables / Unspecified


Real = Tableware / Serveware / Teapots & Accessories / Unspecified
Pred = Tableware / Unspecified / Unspecified / Unspecified


Real = Tableware / Serveware / Teapots & Accessories / Unspecified
Pred = Tableware / Dinnerware / Mugs / Unspecified


Real = Tableware / Dinnerware / Mugs / Unspecified
Pred = Tableware / Dinnerware / Bowls / Unspecified


Real = Tableware / Glassware / Wine Glasses / Unspecified
Pred = Tableware / Glassware / Champagne Glasses / Unspecified


Real = Tableware / Serveware / Teapots & Accessories / Unspecified
Pred = Tableware / Unspecified / Unspecified / Unspecified


Real = Tableware / Dinnerware / Mugs / Unspecified
Pred = Tableware / Serveware / Teapots & Accessories / Unspecified


Real = Tableware / Glassware / Wine Glasses / Unspecified
Pred = Tableware / Glassware / Champagne Glasses / Unspecified


Real = Tableware / Wine & Bar Accessories / Decanters & Bottles / Unspecified
Pred = Tableware / Glassware / Drinking Glasses / Unspecified


Real = Tableware / Wine & Bar Accessories / Decanters & Bottles / Unspecified
Pred = Tableware / Glassware / Drinking Glasses / Unspecified


Real = Lighting / Floor Lamps / Unspecified / Unspecified
Pred = Lighting / Desk & Table Lamps / Unspecified / Unspecified


Real = Lighting / Floor Lamps / Unspecified / Unspecified
Pred = Lighting / Desk & Table Lamps / Unspecified / Unspecified


Real = Lighting / Lighting Accessories / Lamp Shades / Unspecified
Pred = Lighting / Floor Lamps / Unspecified / Unspecified


Real = Lighting / Desk & Table Lamps / Unspecified / Unspecified
Pred = Lighting / Decorative Lighting / Decorative Lighting / Unspecified


Real = Lighting / Lighting Accessories / Light bulbs / Unspecified
Pred = Lighting / Desk & Table Lamps / Unspecified / Unspecified


Real = Lighting / Floor Lamps / Unspecified / Unspecified
Pred = Lighting / Desk & Table Lamps / Unspecified / Unspecified


Real = Home Textiles / Unspecified / Unspecified / Unspecified
Pred = Decoration / Home Accessories / Other / Doorstoppers


Real = Home Textiles / Unspecified / Unspecified / Unspecified
Pred = Decoration / Home Accessories / Figurines / People


Real = Home Textiles / Unspecified / Unspecified / Unspecified
Pred = Decoration / Home Accessories / Other / Doorstoppers


Real = Home Textiles / Unspecified / Unspecified / Unspecified
Pred = Decoration / Home Accessories / Other / Doorstoppers


Real = Home Textiles / Unspecified / Unspecified / Unspecified
Pred = Decoration / Home Accessories / Other / Doorstoppers


Real = Flowers & Plants / Artificial Flowers / Unspecified / Unspecified
Pred = Flowers & Plants / Unspecified / Unspecified / Unspecified


Real = Flowers & Plants / Artificial Branches / Unspecified / Unspecified
Pred = Flowers & Plants / Artificial Flowers / Unspecified / Unspecified


Real = Flowers & Plants / Artificial Branches / Unspecified / Unspecified
Pred = Flowers & Plants / Artificial Flowers / Unspecified / Unspecified


Real = Flowers & Plants / Artificial Branches / Unspecified / Unspecified
Pred = Flowers & Plants / Artificial Flowers / Unspecified / Unspecified


Class representativeness - correlation between class proportions in the error data and in the test data¶

  • the higher it is, the more similar the class proportions are
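
To make the coefficient concrete, the sketch below recomputes Pearson's r from the rounded `main` proportions printed in the first table of this section. It uses `np.corrcoef` in place of the pandas `.corr()` call in `generate_ratio_df`, which is a substitution made here for brevity:

```python
import numpy as np

# Rounded class proportions for "main", copied from the notebook output
# (order: Decoration, Furniture, Tableware, Lighting, Home Textiles,
# Flowers & Plants).
ratio_in_errors = np.array([0.58, 0.16, 0.10, 0.06, 0.05, 0.04])
ratio_in_tests = np.array([0.73, 0.08, 0.07, 0.05, 0.05, 0.02])

# np.corrcoef returns the 2x2 correlation matrix; [0, 1] is Pearson's r.
r = np.corrcoef(ratio_in_errors, ratio_in_tests)[0, 1]
print(round(r, 3))
```

A value close to 1 means errors are spread across classes roughly in proportion to how often each class appears in the test set, so no class is strongly over-represented among the errors.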
In [21]:
generate_ratio_df(errors_df, test_df, "main")
r Pearson Correlation = 0.988
Out[21]:
   main              ratio_in_errors  ratio_in_tests   diff
0  Decoration                   0.58            0.73  -0.15
1  Furniture                    0.16            0.08   0.08
2  Tableware                    0.10            0.07   0.03
3  Lighting                     0.06            0.05   0.01
4  Home Textiles                0.05            0.05   0.00
5  Flowers & Plants             0.04            0.02   0.02
In [22]:
generate_ratio_df(errors_df, test_df, "sub")
r Pearson Correlation = 0.859
Out[22]:
    sub                          ratio_in_errors  ratio_in_tests   diff
0   Home Accessories                        0.30            0.32  -0.02
1   Candles & Candle Holders                0.03            0.16  -0.13
2   Flower Pots & Vases                     0.09            0.14  -0.05
3   Wall Decoration                         0.04            0.04   0.00
4   Decoration Storage                      0.04            0.03   0.01
5   Unspecified                             0.08            0.03   0.05
6   Sofas & Armchairs                       0.01            0.03  -0.02
7   Tables                                  0.09            0.03   0.06
8   Dinnerware                              0.02            0.02   0.00
9   Desk & Table Lamps                      0.01            0.02  -0.01
10  Cushions                                0.00            0.02  -0.02
11  Home Fragrances                         0.00            0.02  -0.02
12  Decorative Lighting                     0.00            0.02  -0.02
13  Serveware                               0.03            0.01   0.02
14  Storage                                 0.06            0.01   0.05
15  Wine & Bar Accessories                  0.02            0.01   0.01
16  Artificial Flowers                      0.01            0.01   0.00
17  Table & Kitchen Accessories             0.00            0.01  -0.01
18  Chairs                                  0.00            0.01  -0.01
19  Lighting Accessories                    0.02            0.01   0.01
20  Glassware                               0.02            0.01   0.01
21  Mirrors                                 0.01            0.01   0.00
22  Photo Frames                            0.00            0.00   0.00
23  Soft Toys                               0.00            0.00   0.00
24  Floor Lamps                             0.03            0.00   0.03
25  Clocks                                  0.02            0.00   0.02
26  Artificial Branches                     0.03            0.00   0.03
27  Artificial Trees                        0.00            0.00   0.00
28  Ceiling Lamps                           0.00            0.00   0.00
29  Blankets & Throws                       0.00            0.00   0.00
30  Garden Accessories                      0.02            0.00   0.02
31  Rugs                                    0.00            0.00   0.00
32  Bed Linen                               0.00            0.00   0.00
33  Decorative Materials                    0.00            0.00   0.00
34  Cutlery                                 0.00            0.00   0.00
In [23]:
generate_ratio_df(errors_df, test_df, "detail")
r Pearson Correlation = 0.756
Out[23]:
    detail              ratio_in_errors  ratio_in_tests   diff
0   Figurines                      0.10            0.14  -0.04
1   Tealight Holders               0.03            0.13  -0.10
2   Decorative Objects             0.15            0.12   0.03
3   Vases                          0.05            0.10  -0.05
4   Unspecified                    0.19            0.09   0.10
..  ...                             ...             ...    ...
64  Buffets                        0.01            0.00   0.01
65  Chests of Drawers              0.01            0.00   0.01
66  Salad Servers                  0.00            0.00   0.00
67  Ice Buckets                    0.00            0.00   0.00
68  Light bulbs                    0.01            0.00   0.01

69 rows × 4 columns

In [24]:
generate_ratio_df(errors_df, test_df, "level4")
r Pearson Correlation = 0.988
Out[24]:
    level4                        ratio_in_errors  ratio_in_tests   diff
0   Unspecified                              0.70            0.69   0.01
1   Animals                                  0.05            0.08  -0.03
2   Decorative Trees                         0.01            0.05  -0.04
3   Garlands & hangers                       0.05            0.04   0.01
4   Objects                                  0.12            0.03   0.09
5   Fantasy                                  0.01            0.02  -0.01
6   People                                   0.02            0.02   0.00
7   Decorative Trays                         0.01            0.02  -0.01
8   Decorative Cones                         0.01            0.01   0.00
9   Flowers & Plants                         0.01            0.01   0.00
10  Wreath                                   0.00            0.01  -0.01
11  Doorstoppers                             0.00            0.00   0.00
12  Buddhas                                  0.00            0.00   0.00
13  Paperweights                             0.00            0.00   0.00
14  Other                                    0.00            0.00   0.00
15  Decorative Bottles                       0.00            0.00   0.00
16  Decorative Letters & Numbers             0.00            0.00   0.00
17  Bookends                                 0.00            0.00   0.00
18  Abstract                                 0.00            0.00   0.00